Ego and Alter Coding Conventions

Matthew Chandler

NetSense

November 4, 2015

These are the conventions I used to code all egos and alters in the NetSense data that I cleaned and de-identified. The codes are consistent across the data files I have used: for example, each study participant has the same five-digit code wherever they appear in the demographic data, network survey data, or behavioral (communication events) data. A study participant who appears as an alter in the behavioral or network survey data keeps that same code. All other codes are similarly consistent across data files.

The codes were generated by random assignment. I used all available identifying information to match and disambiguate egos and alters across all data files.

(See the file “NetSense Data Manipulations.nb” for details.)
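As an illustration only, random assignment of consistent codes could be sketched in Python as below. This is not taken from the notebook cited above. The function name assign_participant_codes, the seed parameter, and the choice to draw from the 10000-89999 participant range are assumptions made for the example.

```python
import random


def assign_participant_codes(identifiers, seed=None):
    """Map each distinct identifier to a unique random five-digit code.

    Hypothetical helper: codes are drawn without replacement from
    10000-89999, so no two identifiers share a code. The real
    assignment procedure may differ.
    """
    rng = random.Random(seed)
    # De-duplicate first so a person who appears in several files
    # receives exactly one code.
    unique_ids = sorted(set(identifiers))
    # range(10000, 90000) covers 10000 through 89999 inclusive.
    codes = rng.sample(range(10000, 90000), len(unique_ids))
    return dict(zip(unique_ids, codes))
```

Building one lookup table and reusing it for every file is what keeps a code stable wherever the same person appears.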

Three Digit

Known invalid nodes (e.g., voicemail, Twitter SMS alert service)

Four Digit

Suspicious nodes that are not known to be invalid (e.g., unknown short codes, very long numbers)

Five Digit

Study participants

10000-89999: most participants

90000-99999: participants marked for exclusion because of incomplete data

(See Participant Canon for details.)

Six Digit

Non-participants

100000-199999: non-participant alters named in the network surveys

200000-899999: non-participant alters not named in the network surveys

900000-999999: reserved for exclusions, if needed
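The ranges above can be turned into a small lookup, sketched here in Python. The names CODE_RANGES and classify_code and the label strings are hypothetical. The sketch also assumes codes are stored as integers without leading zeros, so that three-digit codes run 100-999 and four-digit codes run 1000-9999; the conventions above do not state those bounds.

```python
# (low, high, label) with inclusive bounds, following the conventions above.
# The three- and four-digit bounds are an assumption (no leading zeros).
CODE_RANGES = [
    (100, 999, "known invalid node"),
    (1000, 9999, "suspicious node"),
    (10000, 89999, "participant"),
    (90000, 99999, "participant marked for exclusion"),
    (100000, 199999, "non-participant alter named in network surveys"),
    (200000, 899999, "non-participant alter not named in network surveys"),
    (900000, 999999, "reserved for exclusions"),
]


def classify_code(code):
    """Return the category label for an integer ego or alter code."""
    for low, high, label in CODE_RANGES:
        if low <= code <= high:
            return label
    raise ValueError(f"code {code} is outside every documented range")
```

For example, classify_code(91234) returns "participant marked for exclusion", and classify_code(150000) returns "non-participant alter named in network surveys".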